{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cb1537e6",
   "metadata": {},
   "source": [
    "# Using Weaviate with Generative OpenAI module for Generative Search\n",
    "\n",
    "This notebook is prepared for a scenario where:\n",
    "* Your data is already in Weaviate\n",
    "* You want to use Weaviate with the Generative OpenAI module ([generative-openai](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/generative-openai)).\n",
    "\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f1a618c5",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "This cookbook only coveres Generative Search examples, however, it doesn't cover the configuration and data imports.\n",
    "\n",
    "In order to make the most of this cookbook, please complete the [Getting Started cookbook](./getting-started-with-weaviate-and-openai.ipynb) first, where you will learn the essentials of working with Weaviate and import the demo data.\n",
    "\n",
    "Checklist:\n",
    "* completed [Getting Started cookbook](./getting-started-with-weaviate-and-openai.ipynb),\n",
    "* crated a `Weaviate` instance,\n",
    "* imported data into your `Weaviate` instance,\n",
    "* you have an [OpenAI API key](https://beta.openai.com/account/api-keys)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36fe86f4",
   "metadata": {},
   "source": [
    "===========================================================\n",
    "## Prepare your OpenAI API key\n",
    "\n",
    "The `OpenAI API key` is used for vectorization of your data at import, and for running queries.\n",
    "\n",
    "If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n",
    "\n",
    "Once you get your key, please add it to your environment variables as `OPENAI_API_KEY`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43395339",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Export OpenAI API Key\n",
    "!export OPENAI_API_KEY=\"your key\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88be138c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test that your OpenAI API key is correctly set as an environment variable\n",
    "# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.\n",
    "import os\n",
    "\n",
    "# Note. alternatively you can set a temporary env variable like this:\n",
    "# os.environ[\"OPENAI_API_KEY\"] = 'your-key-goes-here'\n",
    "\n",
    "if os.getenv(\"OPENAI_API_KEY\") is not None:\n",
    "    print (\"OPENAI_API_KEY is ready\")\n",
    "else:\n",
    "    print (\"OPENAI_API_KEY environment variable not found\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91df4d5b",
   "metadata": {},
   "source": [
    "## Connect to your Weaviate instance\n",
    "\n",
    "In this section, we will:\n",
    "\n",
    "1. test env variable `OPENAI_API_KEY` – **make sure** you completed the step in [#Prepare-your-OpenAI-API-key](#Prepare-your-OpenAI-API-key)\n",
    "2. connect to your Weaviate with your `OpenAI API Key`\n",
    "3. and test the client connection\n",
    "\n",
    "### The client \n",
    "\n",
    "After this step, the `client` object will be used to perform all Weaviate-related operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc662c1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import weaviate\n",
    "from datasets import load_dataset\n",
    "import os\n",
    "\n",
    "# Connect to your Weaviate instance\n",
    "client = weaviate.Client(\n",
    "    url=\"https://your-wcs-instance-name.weaviate.network/\",\n",
    "    # url=\"http://localhost:8080/\",\n",
    "    auth_client_secret=weaviate.auth.AuthApiKey(api_key=\"<YOUR-WEAVIATE-API-KEY>\"), # comment out this line if you are not using authentication for your Weaviate instance (i.e. for locally deployed instances)\n",
    "    additional_headers={\n",
    "        \"X-OpenAI-Api-Key\": os.getenv(\"OPENAI_API_KEY\")\n",
    "    }\n",
    ")\n",
    "\n",
    "# Check if your instance is live and ready\n",
    "# This should return `True`\n",
    "client.is_ready()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ceb14da9",
   "metadata": {},
   "source": [
    "## Generative Search\n",
    "Weaviate offers a [Generative Search OpenAI](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/generative-openai) module, which generates responses based on the data stored in your Weaviate instance.\n",
    "\n",
    "The way you construct a generative search query is very similar to a standard semantic search query in Weaviate. \n",
    "\n",
    "For example:\n",
    "* search in \"Articles\", \n",
    "* return \"title\", \"content\", \"url\"\n",
    "* look for objects related to \"football clubs\"\n",
    "* limit results to 5 objects\n",
    "\n",
    "```\n",
    "    result = (\n",
    "        client.query\n",
    "        .get(\"Articles\", [\"title\", \"content\", \"url\"])\n",
    "        .with_near_text(\"concepts\": \"football clubs\")\n",
    "        .with_limit(5)\n",
    "        # generative query will go here\n",
    "        .do()\n",
    "    )\n",
    "```\n",
    "\n",
    "Now, you can add `with_generate()` function to apply generative transformation. `with_generate` takes either:\n",
    "- `single_prompt` - to generate a response for each returned object,\n",
    "- `grouped_task` – to generate a single response from all returned objects.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51559251",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generative_search_per_item(query, collection_name):\n",
    "    prompt = \"Summarize in a short tweet the following content: {content}\"\n",
    "\n",
    "    result = (\n",
    "        client.query\n",
    "        .get(collection_name, [\"title\", \"content\", \"url\"])\n",
    "        .with_near_text({ \"concepts\": [query], \"distance\": 0.7 })\n",
    "        .with_limit(5)\n",
    "        .with_generate(single_prompt=prompt)\n",
    "        .do()\n",
    "    )\n",
    "    \n",
    "    # Check for errors\n",
    "    if (\"errors\" in result):\n",
    "        print (\"\\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.\")\n",
    "        raise Exception(result[\"errors\"][0]['message'])\n",
    "    \n",
    "    return result[\"data\"][\"Get\"][collection_name]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4604726",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_result = generative_search_per_item(\"football clubs\", \"Article\")\n",
    "\n",
    "for i, article in enumerate(query_result):\n",
    "    print(f\"{i+1}. { article['title']}\")\n",
    "    print(article['_additional']['generate']['singleResult']) # print generated response\n",
    "    print(\"-----------------------\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "a45ea160",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generative_search_group(query, collection_name):\n",
    "    generateTask = \"Explain what these have in common\"\n",
    "\n",
    "    result = (\n",
    "        client.query\n",
    "        .get(collection_name, [\"title\", \"content\", \"url\"])\n",
    "        .with_near_text({ \"concepts\": [query], \"distance\": 0.7 })\n",
    "        .with_generate(grouped_task=generateTask)\n",
    "        .with_limit(5)\n",
    "        .do()\n",
    "    )\n",
    "    \n",
    "    # Check for errors\n",
    "    if (\"errors\" in result):\n",
    "        print (\"\\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.\")\n",
    "        raise Exception(result[\"errors\"][0]['message'])\n",
    "    \n",
    "    return result[\"data\"][\"Get\"][collection_name]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11e0dad2",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_result = generative_search_group(\"football clubs\", \"Article\")\n",
    "\n",
    "print (query_result[0]['_additional']['generate']['groupedResult'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2007be48",
   "metadata": {},
   "source": [
    "Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
